Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(kinesis): rename fields of KinesisOffset and KinesisSplit to make everything explicit #18704

Merged
merged 3 commits into from
Sep 25, 2024

Conversation

stdrc
Copy link
Member

@stdrc stdrc commented Sep 25, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

As part of #18697, this PR revisits the kinesis source connector and makes the offset semantics explicit in KinesisOffset and KinesisSplit. Backward-compatibility is achieved by #[serde(alias = "...")].

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Copy link
Member Author

stdrc commented Sep 25, 2024

@stdrc stdrc changed the title rename fields of KinesisOffset and KinesisSplit tp make everything explicit refactor(kinesis): rename fields of KinesisOffset and KinesisSplit to make everything explicit Sep 25, 2024
@stdrc stdrc marked this pull request as ready for review September 25, 2024 07:37
Comment on lines +90 to +91
next_offset: KinesisOffset::None,
end_offset: KinesisOffset::None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can see from

KinesisOffset::SequenceNumber(seq) => (
Some(seq.clone()),
None,
ShardIteratorType::AfterSequenceNumber,
),

if kinesis read from a position, in current impl, it should begin with the next one.
So next_offset does not mean next offset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AfterSequenceNumber is a way to denote the next one after the given sequence number. I don't see any problem here...

Copy link
Member Author

@stdrc stdrc Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try read this in English: The next_offset is the offset AfterSequenceNumber("abc").

And compare it with previous one: The start_position is the offset SequenceNumber("abc").

Copy link
Contributor

@tabVersion tabVersion Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you want to eliminate the concept misalignment in the sources' split def, it is good for the project.

The variable name ought to describe the value it stores, not involving how other components do with it. But here, it stores the latest successful read offset (and literally the start offset if begin with Timestamp or TrimHorizen). It is not the next offset.
Kinesis does have its own primitive to begin consumption with the offset after the given one, your proposal may not take this for granted.

Try read this in English

Besides this, I know English well and please remain calm and friendly in communications. Be professional, dude.

Copy link
Member Author

@stdrc stdrc Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try read this in English

Besides this, I know English well and please remain calm and friendly in communications. Be professional, dude.

I'm saying that the new semantics is understandable in English, while the previous one is not.

The value stored by start_position: KinesisOffset is not the sequence number, it's After(sequence number). When recovery happens, the start position is the next one after sequence number, so what start_position = SequenceNumber(abc) actually means is start_position = AfterSequenceNumber(abc). The renaming from start_position to next_offset doesn't change any semantics, it doesn't matter. The core change is start_position = SequenceNumber(abc) to start_position = AfterSequenceNumber(abc).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your logic, if start_position is last seen offset, then the Earliest and Latest variants should be BeforeEarliest, BeforeLatest. Is it clear enough?

Copy link
Contributor

@tabVersion tabVersion Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your logic, if start_position is last seen offset, then the Earliest and Latest variants should be BeforeEarliest, BeforeLatest. Is it clear enough?

Using sarcasm and rhetorical questions in the workplace is not a friendly communication attitude.

Is it clear enough?

Yes, I fully understand your motivation and proposed changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your logic, if start_position is last seen offset, then the Earliest and Latest variants should be BeforeEarliest, BeforeLatest. Is it clear enough?

Using sarcasm and rhetorical questions in the workplace is not a friendly communication attitude.

Hah? I'm not using any sarcasm and rhetorical question. "Is it clear enough?" just means "这么解释足够清楚了吗?".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't be so sensitive. I'm just trying to explain. I admit that I was a bit impatient at the very beginning of this thread. But I didn't mean to be any aggressive after that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I fully understand your motivation and proposed changes.

Ok, then I think we have reached a consensus.

@stdrc stdrc requested review from fuyufjh and tabVersion September 25, 2024 09:12
Base automatically changed from rc/fix-kinesis-source-creation to main September 25, 2024 10:26
@buildkite buildkite bot requested a review from a team as a code owner September 25, 2024 10:26
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Didn't read the discussion above) The change lgtm

Do I understand correctly:

  • Previous start_position looks obstacle (the "start" word). It means how other code should use it, but seems wrong.
  • If we want to make it more consistent with "store" site, it should be renamed to last_seen_offset. (But seems not OK, since we have "timestamp"/"latest", etc)
  • If we want to make it more consistent with "read" site, call it next_offset = After(offset) looks cool.

@xxchan
Copy link
Member

xxchan commented Sep 25, 2024

Wait, don't next_offset and start_offset mean the same thing..? (whether "start" is inclusive or exclusive is not clear though, (and usually inclusive (so maybe next_offset mean the same as start_offset_exclusive, lol)))

So the real great invention to me is to rename the enum variant to After

@stdrc
Copy link
Member Author

stdrc commented Sep 25, 2024

Wait, don't next_offset and start_offset mean the same thing..? (whether "start" is inclusive or exclusive is not clear though, (and usually inclusive (so maybe next_offset mean the same as start_offset_exclusive, lol)))

So the real great invention to me is to rename the enum variant to After

Exactly. As I said #18704 (comment), renaming from start_position to next_offset is just for perfection. The real change is to change Offset::SequenceNumber(abc) to Offset::AfterSequenceNumber(abc), so that variants of KinesisOffset mean the same thing but only in different types.

@stdrc stdrc added this pull request to the merge queue Sep 25, 2024
Merged via the queue into main with commit e535aa7 Sep 25, 2024
30 of 31 checks passed
@stdrc stdrc deleted the rc/kinesis-offset branch September 25, 2024 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants